Abstract

This project focused on a multifaceted study of a class of cluster-detection problems arising
in biological and social networks. The study included defining new cluster models and their
alternative mathematical programming formulations, analyzing them theoretically, and developing
exact algorithms and heuristics. Originally, clusters (complexes, modules, cohesive subgroups)
in biological and social networks were described by cliques (complete subgraphs) or connected
components. However, in many practical situations cliques appear to be overly restrictive,
whereas connected components are not sufficiently “tight” to serve as clusters. This project
considers a class of cluster concepts that “relax” the definition of a clique yet are tighter
than connected components. Such problems are of great practical as well as theoretical interest.

The developed approaches are based on representing the cluster-detection problems in
biological and social networks in terms of discrete and continuous optimization models. In
particular, one of the most promising directions in global optimization research deals with
continuous (nonconvex) approaches to discrete optimization problems. The research in this
project centered on the following three major thrusts:

1. Theoretical study of optimization models: (i) analyzing new optimality conditions for
discrete optimization problems based on their continuous nonlinear formulations; (ii) using
nonlinear formulations of discrete problems to deduce important computational complexity
results for general and special cases of continuous optimization; (iii) obtaining lower and
upper bounds based on nonlinear formulations that can be utilized in exact algorithms.

2. Algorithm development and analysis: (i) developing effective exact combinatorial
algorithms for the problems of interest; (ii) designing new problem scale-reduction techniques
and graph decomposition methods that can be used to solve special cases of large-scale
instances arising in practical situations to optimality; (iii) developing and analyzing a new
metaheuristic technique, variable objective search.

3. Experimentation and application: implementing the proposed algorithms, testing and
fine-tuning the developed software using a variety of randomly generated problems and
instances available in the public domain, and applying the software to real-life biological
and social data.

The significance of the clique and related concepts in the development of graph theory,
theoretical computer science, and optimization, including computational complexity theory and
semidefinite programming, has been well documented. The models developed in this project
provide systematic relaxations of the concept of a clique and can potentially impact a number
of important application areas.


1  Summary  of  Research  Contributions 

The  research  contributions  resulting  from  the  project  are  summarized  in  the  following  subsections. 
Namely,  we  list  abstracts/brief  summaries  of  the  papers  supported  by  this  project  that  have  been 
finalized  and  are  published/accepted  for  publication  or  submitted/ready  for  submission. 

In [1], we introduce and study the maximum k-plex problem, which arises in social network
analysis, but can also be used in several other important application areas, including wireless
networks, telecommunications, and graph-based data mining. We establish NP-completeness of the
decision version of the problem on arbitrary graphs. An integer programming formulation is
presented and a basic polyhedral study of the problem is carried out. A branch-and-cut
implementation is discussed, and computational test results on the proposed benchmark instances
and real-life scale-free graphs are also provided.

In [2], we study the maximum quasi-clique problem, which defines a cluster based on edge
density. Given a simple undirected graph G = (V, E) and a constant γ ∈ (0, 1), a subset of
vertices is called a γ-quasi-clique or, simply, a γ-clique if it induces a subgraph with an edge
density of at least γ. The maximum γ-clique problem consists in finding a γ-clique of largest
cardinality in the graph. Despite numerous practical applications, this problem has not been
rigorously studied from a mathematical perspective, and no exact solution methods have been
proposed in the literature. This paper, for the first time, establishes some fundamental
properties of the maximum γ-clique problem, including the NP-completeness of its decision
version for any fixed γ satisfying 0 < γ < 1, the weak heredity property, and analytical upper
bounds on the size of a maximum γ-clique. Moreover, mathematical programming formulations of the
problem are proposed, and results of preliminary numerical experiments using a state-of-the-art
optimization solver to find exact solutions are presented.
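
The γ-clique definition above can be checked directly. The following Python sketch is only an
illustration of the definition, not code from [2]; the adjacency-set representation, the
function names, the toy graph, and the convention that a set with fewer than two vertices has
density 1 are all assumptions made here.

```python
from itertools import combinations

def edge_density(adj, subset):
    """Fraction of vertex pairs in `subset` that are joined by an edge."""
    vertices = list(subset)
    if len(vertices) < 2:
        return 1.0  # assumed convention for sets with no vertex pairs
    edges = sum(1 for u, v in combinations(vertices, 2) if v in adj[u])
    pairs = len(vertices) * (len(vertices) - 1) / 2
    return edges / pairs

def is_quasi_clique(adj, subset, gamma):
    """True if `subset` induces a subgraph with edge density at least gamma."""
    return edge_density(adj, subset) >= gamma

# Hypothetical toy graph: the 4-cycle 0-1-2-3-0 plus the chord 0-2 (5 of 6 possible edges).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}

print(is_quasi_clique(adj, {0, 1, 2, 3}, 0.8))  # True: density is 5/6
print(is_quasi_clique(adj, {1, 3}, 0.8))        # False: vertices 1 and 3 are not adjacent
```

In this toy graph the full vertex set is a 0.8-clique while its subset {1, 3} is not, so a
subset of a γ-clique need not be a γ-clique.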

In  [3],  we  introduce  the  variable  objective  search  framework  for  combinatorial  optimization. 
The  method  utilizes  different  objective  functions  used  in  alternative  mathematical  programming 
formulations  of  the  same  combinatorial  optimization  problem  in  an  attempt  to  improve  the  solutions 
obtained  using  each  of  these  formulations  individually.  The  proposed  technique  is  illustrated  using 
alternative  quadratic  unconstrained  binary  formulations  of  the  classical  maximum  independent  set 
problem  in  graphs. 

In [4], we deal with a diameter-based clique relaxation model. Given a simple undirected graph
G, a k-club is a subset of vertices inducing a subgraph of diameter at most k. The maximum
k-club problem (MkCP) is to find a k-club of maximum cardinality in G. These structures,
originally introduced to model cohesive subgroups in social network analysis, are of interest in
network-based data mining and clustering applications. The maximum k-club problem is NP-hard;
moreover, determining whether a given k-club is maximal (by inclusion) is NP-hard as well. This
paper first provides a sufficient condition for testing maximality of a given k-club. Then it
proceeds to develop a variable neighborhood search (VNS) heuristic and an exact algorithm for
MkCP that uses the VNS solution as a lower bound. Computational experiments with test instances
available in the literature show that the proposed algorithms are very effective on sparse
instances and outperform the existing methods on most dense graphs from the testbed.
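
Verifying that a given vertex set is a k-club takes only breadth-first searches restricted to
the induced subgraph. The Python sketch below illustrates the definition only; it is not the
VNS heuristic or the exact algorithm of [4], and the function names and the star-shaped toy
graph are assumptions made here.

```python
from collections import deque

def induced_distances(adj, subset, source):
    """BFS distances from `source` using only vertices inside `subset`."""
    allowed = set(subset)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_k_club(adj, subset, k):
    """True if the subgraph induced by `subset` has diameter at most k."""
    members = set(subset)
    for u in members:
        dist = induced_distances(adj, members, u)
        # An unreachable member means the induced subgraph is disconnected.
        if len(dist) < len(members) or max(dist.values()) > k:
            return False
    return True

# Hypothetical toy graph: a star with center 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

print(is_k_club(star, {0, 1, 2, 3}, 2))  # True: every pair is within distance 2
print(is_k_club(star, {1, 2, 3}, 2))     # False: without the center the leaves are disconnected
```

The second call shows that removing a vertex from a k-club can destroy the property, which is
one reason why inclusion-wise maximality cannot be tested by trying single-vertex additions
alone.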

Paper [5] analyzes the elementary clique-defining properties implicitly exploited in the
available clique relaxation models and proposes a taxonomic framework that not only allows the
existing models to be classified in a systematic fashion, but also yields new clique relaxations
of potential practical interest. Some basic structural properties of several of the considered
models are identified that may facilitate the choice of methods for solving the corresponding
optimization problems. In addition, bounds describing the cohesiveness properties of different
clique relaxation structures are established, and practical implications of choosing one model
over another are discussed.

In [6], we exploit the heredity property of two of the clique relaxation models, k-plex and
s-defective clique, coupled with effective scale-reduction procedures, in order to develop
effective exact algorithms for these models. The general node-deletion problem can be stated as
follows: Given a graph property Π, find the minimum number of nodes that need to be removed from
the graph in order to obtain an induced subgraph satisfying property Π. In 1978, Yannakakis
showed that if the property Π is hereditary on induced subgraphs, nontrivial, and interesting,
the resulting node-deletion problem is NP-hard. Such node-deletion problems have numerous
practical applications, including social network analysis and graph-based data mining. This
paper proposes an exact approach to solve the node-deletion problems of interest, which can be
viewed as an extension of some of the most successful exact algorithms for the classical maximum
clique problem. The excellent performance of the approach is illustrated through a comprehensive
computational study for the maximum k-plex and the maximum s-defective clique problems.
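
The two models treated in [6] are not defined in this summary. The Python sketch below assumes
their standard definitions from the literature: a vertex set S is a k-plex if every vertex of S
is adjacent to at least |S| - k other vertices of S, and S is an s-defective clique if the
subgraph it induces is missing at most s of the edges of a complete graph on S. The function
names and the toy graph are hypothetical, and the code illustrates only the heredity property,
not the exact algorithm of [6].

```python
from itertools import combinations

def is_k_plex(adj, subset, k):
    """Assumed definition: each member has at least |S| - k neighbors inside S."""
    members = set(subset)
    return all(len(adj[v] & members) >= len(members) - k for v in members)

def is_s_defective_clique(adj, subset, s):
    """Assumed definition: at most s vertex pairs inside S are non-adjacent."""
    members = sorted(subset)
    edges = sum(1 for u, v in combinations(members, 2) if v in adj[u])
    missing = len(members) * (len(members) - 1) // 2 - edges
    return missing <= s

def holds_for_all_subsets(adj, subset, predicate):
    """Brute-force heredity check: the predicate must hold for every subset."""
    items = sorted(subset)
    return all(
        predicate(adj, set(chosen))
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    )

# Hypothetical toy graph: the 4-cycle 0-1-2-3-0 (edges 0-2 and 1-3 are missing).
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
everyone = {0, 1, 2, 3}

print(is_k_plex(cycle, everyone, 2))              # True
print(is_s_defective_clique(cycle, everyone, 2))  # True
print(holds_for_all_subsets(cycle, everyone, lambda g, S: is_k_plex(g, S, 2)))  # True
```

Because both properties survive vertex removal, finding a largest such set is equivalent to
deleting a smallest set of nodes, which is how these problems fit the node-deletion framework
described above.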

In [7], we explore scale-reduction techniques that use the knowledge of common neighbors to
obtain the maximum clique on very large-scale real-life networks (a million nodes). Analytically,
the technique has been shown to be very effective on power-law random graphs. Experimental
results on graphs from the SNAP database (collaboration networks, P2P networks, social networks,
etc.) show our procedure to be much more effective than a regular peeling approach, helping us
obtain the maximum clique in all the test cases.
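
The contrast between peeling and a common-neighbor test can be sketched as follows. This is a
minimal illustration of the general idea and not the procedure of [7]: it assumes a target
clique size is given and relies only on two facts, namely that every vertex of a clique of that
size has degree at least target - 1, and every edge of such a clique has at least target - 2
common neighbors. All names and the toy graph are hypothetical.

```python
from itertools import combinations

def peel_by_degree(adj, target):
    """Repeatedly delete vertices whose degree is below target - 1."""
    g = {u: set(nb) for u, nb in adj.items()}
    changed = True
    while changed:
        changed = False
        for u in list(g):
            if len(g[u]) < target - 1:
                for v in g[u]:
                    g[v].discard(u)
                del g[u]
                changed = True
    return g

def reduce_by_common_neighbors(adj, target):
    """Also delete edges whose endpoints share fewer than target - 2 neighbors."""
    g = peel_by_degree(adj, target)
    changed = True
    while changed:
        changed = False
        for u in list(g):
            for v in list(g[u]):
                if u < v and len(g[u] & g[v]) < target - 2:
                    g[u].discard(v)
                    g[v].discard(u)
                    changed = True
        size_before = len(g)
        g = peel_by_degree(g, target)  # edge deletions may expose low-degree vertices
        changed = changed or len(g) < size_before
    return g

# Hypothetical toy graph: a 4-clique on {0,1,2,3} next to a triangle-free
# complete bipartite graph on {4,5,6} and {7,8,9}; every vertex has degree 3.
graph = {i: set() for i in range(10)}
pairs = list(combinations(range(4), 2)) + [(a, b) for a in (4, 5, 6) for b in (7, 8, 9)]
for a, b in pairs:
    graph[a].add(b)
    graph[b].add(a)

print(sorted(peel_by_degree(graph, 4)))              # all ten vertices survive
print(sorted(reduce_by_common_neighbors(graph, 4)))  # [0, 1, 2, 3]
```

In this example degree-based peeling removes nothing, whereas the common-neighbor test discards
the entire bipartite part and leaves only the 4-clique.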

2  Survey  Articles 

Publication [8] is an encyclopedia article that introduces the closely related maximum clique,
maximum independent set, graph coloring, and minimum clique partitioning problems. The survey
includes some of the most important results concerning these problems, including their
computational complexity, known bounds, mathematical programming formulations, and exact and
heuristic algorithms to solve them. Finally, book chapter [9] describes the origins of clique
relaxation concepts arising in social network analysis and provides a brief overview of their
mathematical programming formulations, algorithms for solving the corresponding optimization
problems, and selected real-life applications of the models of interest.